	IDIOLECTIC LANGUAGE-ANALYSIS FOR
	UNDERSTANDING DOCTOR-PATIENT DIALOGUES

	HORACE ENEA AND KENNETH MARK COLBY

		INTRODUCTION

	Why is it so difficult for machines to understand natural
language?  Perhaps it is because machines do not simulate sufficiently
what humans do when humans process language.  Several years of
experience with computer-science and linguistic approaches have
taught us the scope and limitations of syntactic and semantic
parsers (Schank, Tesler and Weber [6], Simmons [7], Winograd [11],
Woods [12]).  While extant linguistic parsers perform satisfactorily
with carefully edited text sentences or with small dictionaries,
they are unable to deal with the everyday language behavior
characteristic of human conversation.  In a rationalistic quest for
certainty, and attracted by an analogy from the proof theory of
logicians in which provability implied computability, computational
linguists hoped to develop formalisms for natural language grammars.
But the hope has not been realized and perhaps in principle cannot
be.  (It is difficult to formalize something which can hardly be
formulated.)

	Linguistic parsers use morphographemic analyses,
parts-of-speech assignments, and dictionaries containing multiple
word-senses, each possessing semantic features and programs or rules
for restricting word combinations.  Such parsers perform a
word-by-word analysis of the input, valiantly disambiguating at each
step in an attempt to construct a meaningful interpretation.  While
it may be sophisticated computationally, a conventional parser is
quite at a loss to deal with the caprice of ordinary conversation.
In everyday discourse people speak colloquially and idiomatically,
using all sorts of pat phrases, slang and cliches.  The number of
special-case expressions is indefinitely large.  Humans are cryptic
and elliptic.  They lard even their written expressions with
meaningless fillers and fragments.  They convey their intentions and
ideas in idiosyncratic and metaphorical ways, blithely violating
rules of 'correct' grammar and syntax.  Given these difficulties,
how is it that people carry on conversations easily most of the
time, while machines thus far have found it extremely difficult to
continue to make appropriate replies indicating some degree of
understanding?

	It seems that people `get the message' without always
analyzing every single word in the input; they even ignore some of
its terms.  People make individualistic and idiosyncratic selections
from highly redundant and repetitious communications.  These
personal selective operations, based on idiosyncratic intentions,
produce a transformation of the input by destroying and even
distorting information.  In speed reading, for example, only a small
percentage of the contentive words on each page need be looked at.
These words somehow resonate with the reader's relevant
conceptual-belief structure, whose processes enable him to
`understand' not simply the language but all sorts of unmentioned
aspects of the situations and events being referred to in the
language.  In written texts up to 5/6 of the input can be distorted
or deleted and the intended message can still successfully be
extracted.  Spoken conversations in English are known to be at least
50% redundant; half the words can be garbled and listeners
nonetheless get the gist or drift of what is being said.

	To approximate such human achievements we require a new
perspective and a practical method which differs from that of
current linguistic approaches.  This alternate approach should
incorporate those aspects of parsers which have been found to work
well, e.g. detecting embedded clauses.  Also, individualistic
features characteristic of an idiolect should have dominant
emphasis.  Parsers represent complex and refined algorithms.  While
on the one hand they subject a sentence to a detailed and sometimes
overkilling analysis, on the other they are finicky and
oversensitive.  A conventional parser may simply halt if a word in
the input sentence is not present in its dictionary.  It finds
ungrammatical expressions such as double prepositions (`Do you want
to get out of from the hospital?') quite confusing.  Parsers
constitute a tight conjunction of tests rather than a loose
disjunction.  As more and more tests are added to the conjunction,
the parser behaves like a finer and finer filter which makes it
increasingly difficult for an expression to pass through.  Parsers
do not allow for the ununderstandings and misunderstandings typical
of everyday human dialogues.

	Finally, it is difficult to keep consistent a dictionary of
over 500 multiple-sense words classified by binary semantic features
or rules.  For example, suppose a noun (Ni) is used by some verbs as
a direct object in the semantic sense of a physical object.  Then it
is noticed that Ni is also used by other verbs in the sense of a
location, so `location' is added to Ni's list of semantic features.
Now Ni suddenly qualifies as a direct object for a lot of other
verbs, but the resultant combinations make no sense even in an
idiolect.  If a special feature is then created for Ni, one loses
the power of general classes of semantic features.  Adding a single
semantic feature can result in the propagation of hidden
inconsistencies and unwanted side-effects; as the dictionary grows
it becomes increasingly unstable and difficult to control.

	Early attempts to develop a pattern-matching approach using
special-purpose heuristics have been described by Colby, Watt and
Gilbert [1], Weizenbaum [9], and Colby and Enea [2].  The
limitations of these attempts are well known to workers in
artificial intelligence.  The man-machine conversations of such
programs soon become impoverished and boring.  Such primitive
context-restricted programs often grasp a topic well enough but too
often do not understand quite what is being said about the topic,
with amusing or disastrous consequences.  This shortcoming is a
consequence of the limitations of a pattern-matching approach
lacking a rich conceptual structure into which the pattern
abstracted from the input can be matched for inferencing.  These
programs also lack a subroutine structure, both pattern-directed and
specific, desirable for generalizations.

	An interesting pattern-matching approach for machine
translation has been developed by Wilks [10].  His program
constructs a pattern from English text input which is matched
against templates in an interlingual data base from which, in turn,
French output is generated without using a generative grammar.

	In the course of constructing an interactive simulation of
paranoia we were faced with the problem of dealing with natural
language as it is used in the doctor-patient situation of a
psychiatric interview.  This domain of discourse admittedly contains
many psychiatrically stereotyped expressions and is constrained in
topics (Newton's laws are rarely discussed).  But it is rich enough
in verbal behavior to be a challenge to a language-understanding
algorithm, since a variety of human experiences are discussed in
this domain, including the interpersonal relation which develops
between the interview participants.  A look at the contents of a
thesaurus reveals that words relating to people and their
interrelations make up at least 70% of language.

	The diagnosis of paranoia is made by psychiatrists relying
mainly on the verbal behavior of the interviewed patient.  If a
paranoid model is to exhibit paranoid behavior in a psychiatric
interview, it must be capable of handling dialogues typical of the
doctor-patient context.  Since the model can communicate only
through teletyped messages, the vis-a-vis aspects of the usual
psychiatric interview are absent.  Thus the model should be able to
deal with typewritten natural language input and to output replies
which are indicative of an underlying paranoid thought process
during the episode of a psychiatric interview.

	In an interview there is always a who saying something to a
whom with definite intentions and expectations.  There are two
situations to be taken into account: the one being talked about and
the one the participants are in.  Sometimes the latter becomes the
former.  Participants in dialogues have intentions, and dialogue
algorithms must take this into account.  The doctor's intention is
to gather certain kinds of information while the patient's intention
is to give information and get help.  A job is to be done; it is not
small talk.  Our working hypothesis is that each participant in the
dialogue understands the other by matching selected
idiosyncratically-significant patterns in the input against
conceptual patterns which contain information about the situation or
event being described linguistically.  This understanding is
communicated reciprocally by linguistic responses judged appropriate
to the intentions and expectations of the participants and to the
requirements of the situation.  In this paper we shall describe only
the input-analyzing processes used to extract a pattern from natural
language input.  In a later communication we shall describe the
inferential processes carried out at the conceptual level once a
pattern has been received by memory from the input-analyzing
processes.

	Studies of our 1971 model of paranoia (PARRY) indicated that
about thirty percent of the sentences were not understood at all,
that is, no concept in the sentence was recognized.  In a somewhat
larger number of cases some concepts, but not all, were recognized.
In many cases these partially recognized sentences led to a partial
understanding that was sufficient to gather the intention of the
speaker and thus to output an appropriate response.  However,
misunderstandings occurred too often.  For example:

		DOCTOR: How old is your mother ?

		PARRY: Twenty-eight

PARRY has interpreted the question as referring to his own age and
answered by giving his age.  The purpose of our new language-analysis
system is to significantly raise the level of understanding by
preventing such misunderstandings while not restricting what can be
said to PARRY.  We do not expect complete understanding from this
system -- even native speakers of the language do not completely
understand the language (e.g. To be, or not to be ...).

	By `understanding' we mean the system can do some or all of
the following:

		1) Determine the intention of the interviewer in making
			this particular utterance.

		2) Make common logical deductions that follow from his
			utterance.

		3) Form an internal representation of the utterance so
			that questions may be answered, commands carried
			out, or data added to memory.

		4) Determine references for pronouns and other anaphora.

		5) Deduce the tone of the utterance, i.e., hostile,
			insulting, ...

		6) Classify the input as a question, rejoinder, command, ...

	The approach we are taking consists of merging the best
features of pattern-directed systems, such as the MAD DOCTOR [2] and
ELIZA [9], and parsing-directed systems, for example Winograd [11]
and Woods [12].  The programs to accomplish this are written in
MLISP2, an extensible version of the programming language MLISP
[5,8], and use an interpreted version of the pattern matcher
designed for a new programming language, LISP70.

	The following is a basic description of the pattern matcher.
We shall illustrate its operation using examples of problems common
to teletyped psychiatric dialogues.

		PATTERN MATCHING

	Pattern-directed computation involves two kinds of operations
on data structures: decomposition and recomposition.  Decomposition
breaks down an input stream into components under the direction of a
decomposition pattern ("dec").  The inverse operation, recomposition,
constructs an output stream under the direction of a recomposition
pattern ("rec").

A rewrite rule is of the form:

		dec → rec

It defines a partial function on streams as follows: if the input
stream matches the dec, then the output stream is generated by the
rec.  The following rule (given as an example only) could be part of
a question-answering function:

	How are you ? → Very well and you ?

If the input stream consists of the four tokens:

		How are you ?

the output stream will consist of the five tokens:

		Very well and you ?

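To make the stream model concrete, here is a minimal sketch in modern
notation (Python; an illustration, not the authors' MLISP2 code) of a
rewrite rule as a pair of token lists acting as a partial function on
an input stream:

	# A rule is (dec, rec): if the input stream matches the dec
	# exactly, the rec is the output stream; otherwise no match (None).
	def apply_rule(dec, rec, stream):
	    return list(rec) if stream == dec else None

	rule = (["How", "are", "you", "?"],
	        ["Very", "well", "and", "you", "?"])
	print(apply_rule(*rule, ["How", "are", "you", "?"]))
	# ['Very', 'well', 'and', 'you', '?']  -- the five output tokens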

REWRITE FUNCTIONS

	A rewrite rule defines a partial function, for example, the
mapping of some particular token into some other particular token.
A broader partial function can be defined as the union of several
rewrite rules.  A rewrite function definition is of the form:

	<name> = dec1 → rec1
		 dec2 → rec2
		 ...
		 decn → recn

VARIABLES

	A function is difficult to define if every case must be
enumerated.  Therefore, rewrite rules allow variables to appear in
patterns.  The value of a variable can be either a list or an atom.
In this paper the notation:

			:X

where X is any identifier, will denote the variable X.  The
variables of each rule are distinct from the variables of all other
rules, even if their names are the same.

	The following definition has only three rewrite rules, but
handles an unlimited number of input streams:

	<REPLY> = HOW ARE YOU ? → VERY WELL AND YOU ?
		= HOW IS :X → I HAVEN'T SEEN :X LATELY.
		= DID :X GO TO :Y ? → WHY DON'T YOU ASK :X YOURSELF ?

A variable can appear more than once in a single dec pattern, but it
must match identical items at each appearance.  Example:

	<EQUAL> = (EQUAL :X :X) → TRUE
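
A minimal sketch of such variable binding (again illustrative Python,
not the authors' implementation): a ":X" element binds one item, and
a repeated variable must match the same item it bound before.

	# Match a dec of literals and ":X" variables against a stream
	# of equal length, returning the bindings or None on failure.
	def match(dec, stream):
	    env = {}
	    if len(dec) != len(stream):
	        return None
	    for p, tok in zip(dec, stream):
	        if p.startswith(":"):
	            if p in env and env[p] != tok:
	                return None        # repeated variable must agree
	            env[p] = tok
	        elif p != tok:
	            return None            # literal mismatch
	    return env

	print(match(["EQUAL", ":X", ":X"], ["EQUAL", "A", "A"]))  # {':X': 'A'}
	print(match(["EQUAL", ":X", ":X"], ["EQUAL", "A", "B"]))  # None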

ELLIPSIS

	To make patterns easier to read and write, the ellipsis
symbol ... can be used to stand for an unnamed variable.  Thus:

	IS ... COMING → NO, ... COULD NOT MAKE IT.

If an ellipsis (...) occurs several times on a side, it designates a
different variable each time.  The n'th ellipsis in a dec designates
the same variable as the n'th ellipsis in the rec.

AUTOMATIC ORDERING OF RULES

	The order of rules in a function definition does not specify
the order in which the system will attempt to apply them.  This
ordering operation is handled by a special system ordering function.
Consider the rewrite function:

	<REPLY> = I SEE :X → SO WHAT ?
		= I SEE ANN → WOW !

Both rules would match:

		I SEE ANN

In such cases the more specific rule takes precedence.  Thus, given:

		I SEE ANN

as the input stream, the output stream would be:

		WOW !

but given:

		I SEE STARS

the output stream would be:

		SO WHAT ?

A literal is more specific than a variable.  A variable appearing
for the second time is more specific than a variable appearing for
the first time in a dec, because the second occurrence of the
variable must match the same pattern as the first occurrence.  The
precedence function is itself written in rewrites and so is both
extendable and changeable by the user.  Currently precedence is
calculated by a left-to-right application of the above criteria.
Therefore, the following function defines the LISP function EQUAL:

	<EQUAL> = (EQUAL :X :X) → TRUE
		= (EQUAL :X :Y) → FALSE
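
The specificity ordering can be sketched as a left-to-right score (an
assumed scoring, for illustration only): literals score highest, a
repeated variable next, a first occurrence lowest.

	# Score a dec left to right: 2 = literal, 1 = repeated
	# variable, 0 = first occurrence of a variable.
	def specificity(dec):
	    seen, score = set(), []
	    for p in dec:
	        if p.startswith(":"):
	            score.append(1 if p in seen else 0)
	            seen.add(p)
	        else:
	            score.append(2)
	    return score

	rules = [["I", "SEE", ":X"], ["I", "SEE", "ANN"]]
	print(max(rules, key=specificity))  # ['I', 'SEE', 'ANN'] is tried first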

SEGMENTS

	Sometimes it is desirable for a variable to match an
indeterminate number of items.  This is notated:

			::X

Use of the double colon ("::") means that the variable (e.g., X)
will match zero or more items.  Example:

	<APPEND> = (APPEND (::X) (::Y)) → (::X ::Y)

If the input stream were:

	(APPEND (A B) (C D E))

the output stream would be:

	(A B C D E)

For increased readability the rule could also be written:

	<APPEND> = (APPEND (...) (...)) → (... ...)

Another example:

	<REPLY> = WHERE DID ::X GO → ::X WENT HOME.

Therefore,

	WHERE DID THE CARPENTER GO → THE CARPENTER WENT HOME.
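
Segment matching requires search, since a "::X" can absorb any number
of items.  A minimal backtracking sketch (illustrative Python, with
repeated-segment consistency omitted for brevity):

	# Match a dec containing literals and "::X" segment variables
	# against a stream, trying every possible segment length.
	def match(dec, stream, env=None):
	    env = dict(env or {})
	    if not dec:
	        return env if not stream else None
	    p, rest = dec[0], dec[1:]
	    if p.startswith("::"):
	        for i in range(len(stream) + 1):   # try each segment length
	            out = match(rest, stream[i:], {**env, p: stream[:i]})
	            if out is not None:
	                return out
	        return None
	    if stream and p == stream[0]:
	        return match(rest, stream[1:], env)
	    return None

	print(match(["WHERE", "DID", "::X", "GO"],
	            ["WHERE", "DID", "THE", "CARPENTER", "GO"]))
	# {'::X': ['THE', 'CARPENTER']}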

APPLICATION

	One of the main deficiencies of the system in which the MAD
DOCTOR was programmed was its lack of adequate subroutining
capability.  Subroutines may be indicated in the rewrite system as
follows:

	<LAST> = () → ()
	       = (:X) → :X
	       = (:X ...) → (...)@LAST

The "@" following an item means that the current input stream is to
be pushed down, that the function indicated is to be entered with
the item as its input stream, and that the output stream is to be
placed into the restored current input stream.  When the called
function is the current function, a "*" may be used as an
abbreviation.  Example:

	<LAST> = () → ()
	       = (:X) → :X
	       = (:X ...) → (...)*

Note that MLISP2 functions may be called as well as rewrite functions.
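
The recursion that "(...)*" expresses in <LAST> can be sketched
directly (Python again, purely as a gloss on the rule):

	# <LAST>: the last item of a list, by recursing on the tail,
	# just as the rec "(...)*" re-enters the same function.
	def last(stream):
	    if not stream:
	        return []              # () → ()
	    if len(stream) == 1:
	        return stream[0]       # (:X) → :X
	    return last(stream[1:])    # (:X ...) → (...)*

	print(last(["A", "B", "C"]))   # C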

GOALS

	To gain the advantage of goal-directed pattern matching and
computing, the following form may be used:

	<PREPOSITIONAL_PHRASE> = <PREPOSITION>:P <NOUN_PHRASE>:N
		→ (PREP_PH :P :N)

The identifier between the angle brackets ("<>") names a rewrite
function whose rules are to be matched against the input stream.
When a match occurs, the output stream of the goal will be bound to
the associated variable.  Example: given the rules:

	<PREPOSITIONAL_PHRASE> = <PREPOSITION>:P <NOUN_PHRASE>:N
		→ (PREP_PH :P :N)

	<NOUN_PHRASE> = TOWN → (NOUN_PH TOWN)
		      = PALO ALTO → (NOUN_PH PALO_ALTO)

	<PREPOSITION> = IN → IN
		      = ON → ON

and the input stream:

		IN PALO ALTO

the output stream would be:

	(PREP_PH IN (NOUN_PH PALO_ALTO))
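
Goals behave like nonterminals in a recursive-descent recognizer.  A
small sketch of the example above (hypothetical Python; each function
returns its output stream plus the unconsumed input):

	# <PREPOSITION>: two literal rules.
	def preposition(toks):
	    if toks and toks[0] in ("IN", "ON"):
	        return toks[0], toks[1:]
	    return None

	# <NOUN_PHRASE>: "PALO ALTO" is the more specific rule.
	def noun_phrase(toks):
	    if toks[:2] == ["PALO", "ALTO"]:
	        return ("NOUN_PH", "PALO_ALTO"), toks[2:]
	    if toks[:1] == ["TOWN"]:
	        return ("NOUN_PH", "TOWN"), toks[1:]
	    return None

	# <PREPOSITIONAL_PHRASE>: two sub-goals bound to :P and :N.
	def prepositional_phrase(toks):
	    p = preposition(toks)
	    n = noun_phrase(p[1]) if p else None
	    return (("PREP_PH", p[0], n[0]), n[1]) if n else None

	print(prepositional_phrase(["IN", "PALO", "ALTO"])[0])
	# ('PREP_PH', 'IN', ('NOUN_PH', 'PALO_ALTO'))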

OPTIONALS

	Many other shorthands exist to simplify writing rules.  One
useful feature that will be mentioned here is the optional.

	<AUXILARY_PHRASE> = <AUXILARY>:A [<NEGATIVE>:N]:N1
		→ (AUX_PH :A [:N]:N1 )

If the optional pattern, enclosed in square brackets ("[]"), occurs
in the input stream, it will be bound to :N and :N1 will be bound to
TRUE.  If the <NEGATIVE> does not occur, :N1 will be bound to FALSE.
On the rec side of the rule, if :N1 is TRUE then :N will be placed
in the output stream; if it is FALSE then nothing is placed in the
output stream at that point.  For example, given the rule above:

	    DO → (AUX_PH DO)
	DO NOT → (AUX_PH DO NOT)
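
A sketch of the optional flag in the same illustrative style:

	# <AUXILARY_PHRASE> with an optional <NEGATIVE>: the presence
	# flag (the rule's :N1) decides whether "NOT" is emitted.
	def auxiliary_phrase(toks):
	    if toks[:1] != ["DO"]:
	        return None
	    present = toks[1:2] == ["NOT"]       # :N1
	    return ("AUX_PH", "DO", "NOT") if present else ("AUX_PH", "DO")

	print(auxiliary_phrase(["DO"]))          # ('AUX_PH', 'DO')
	print(auxiliary_phrase(["DO", "NOT"]))   # ('AUX_PH', 'DO', 'NOT')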

MORE EXAMPLES

	We have collected a large number of dialogues using our
previous program PARRY.  These dialogues form a large body of
examples of the kind of English which we can expect.  Martin Frost,
a graduate student in Computer Science, Stanford University, has
written a keyword-in-context program which enables us to isolate
examples centered on particular words, so that uses of those words
in context become more apparent.  Our general approach is to build a
system which can produce the desired interpretations from these
examples and to incrementally add to the rules in the system as new
cases are discovered during the running of the program.

Following are some commonly occurring situations and examples of the
kind of rules we use to handle them.

QUESTION INTRODUCER

	In doctor-patient dialogues it is quite common to introduce
a question by the use of a command.  The "question introducer" is
followed by either a <NOUN_PHRASE> or a <DECLARATIVE_SENTENCE>.  For
example,

	COULD YOU TELL ME YOUR NAME?

Rather than attempt a literal analysis of this question, which might
lead to the interpretation:

	DO YOU HAVE THE ABILITY TO SPEAK YOUR NAME TO ME?

we utilize rules like:

	<SENTENCE> = <QUESTION_INTRODUCER>:Q <NOUN_PHRASE>:N
			→ (IS :N *?*)

	<QUESTION_INTRODUCER> = COULD YOU TELL ME →
			      = WOULD YOU TELL ME →
			      = PLEASE TELL ME →
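
A sketch of the effect of these rules (illustrative Python; the
introducer rewrites to nothing, so only the noun phrase survives):

	INTRODUCERS = ["COULD YOU TELL ME", "WOULD YOU TELL ME",
	               "PLEASE TELL ME"]

	# Strip a question introducer and wrap the remainder,
	# COULD YOU TELL ME YOUR NAME ? → (IS YOUR NAME *?*).
	def sentence(text):
	    for intro in INTRODUCERS:
	        if text.startswith(intro + " "):
	            noun_phrase = text[len(intro):].strip(" ?")
	            return "(IS " + noun_phrase + " *?*)"
	    return None

	print(sentence("COULD YOU TELL ME YOUR NAME ?"))
	# (IS YOUR NAME *?*)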

	Although it is conceivable that there are an infinite number
of ways to introduce a question in this manner, we have found that
only about six literal strings are actually used in our data base of
dialogues.  When we discover a new string we incrementally add a
rule.  When we have enough examples to detect a more general form,
we replace the rules for <QUESTION_INTRODUCER> by a more elegant and
general formulation.  This approach allows us to process dialogues
before we have a complete analysis of all possible sentence
constructions, and it allows us to build a language analyzer based
on actually occurring forms.

	Notice that it is possible to make more than one analysis of
any given sentence depending on what is being looked for.  A poet
might be interested in the number of syllables per word and the
patterns of stress.  A "full" analysis of English must allow for
this possibility, but it is clearly foolish to produce this kind of
analysis for PARRY.  Our analysis will be partial and idiosyncratic
to the needs of our program.

FILLERS

	It is quite common for interviewers to introduce words of
little significance to PARRY into the sentence.  For example:

	WELL, WHAT IS YOUR NAME?

The "well" in this sentence serves no purpose in PARRY's analysis,
although it might to a linguist interested in hesitation phenomena.
These fillers can be ignored.  The following rules accomplish this:

	<SENTENCE> = <FILLER>:F <SENTENCE>:S → :S

	<FILLER> = WELL
		 = OK
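
In effect the filler is matched and discarded; a sketch:

	# Drop leading fillers (and a trailing comma) before further
	# analysis; the filler list here is just the two rules above.
	FILLERS = {"WELL", "OK"}

	def strip_fillers(tokens):
	    while tokens and (tokens[0] in FILLERS or tokens[0] == ","):
	        tokens = tokens[1:]
	    return tokens

	print(strip_fillers(["WELL", ",", "WHAT", "IS", "YOUR", "NAME", "?"]))
	# ['WHAT', 'IS', 'YOUR', 'NAME', '?']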

PUNCTUATION

	Interviewers use little intra-sentence punctuation in talking
to PARRY.  When it is used, it is often to separate phrases that
might otherwise be ambiguous.  Example:

	WHY WEREN'T YOU VERY CLOSE, FRANK

Here the comma clearly puts "CLOSE" in a different phrase from
"FRANK".  Punctuation, when used in PARRY's rules, is generally
enclosed in optional brackets ("[]").  This has the effect of
separating phrases when punctuation is used, but not requiring full
punctuation for the system to work.  Example:

	<SENTENCE> = <SENTENCE>:S1 [,]:C <SENTENCE_CONNECTOR>:SC
		<SENTENCE>:S2

CLICHES AND IDIOMS

	The English we encounter in doctor-patient dialogues is made
up of a great number of cliches and idioms; we therefore anticipate
a large number of rules devoted to them.  For example:

	<TIME_PHRASES> = A COUPLE OF <TIME_UNIT>:T AGO
		→ (TIME (RELATIVE PAST) (REF PRESENT) :T)

	<TIME_UNIT> = SECONDS → (WITHIN CONVERSATION)
		    = MOMENTS → (WITHIN CONVERSATION)
		    = DAYS → (BEFORE CONVERSATION DAYS)

REPRESENTATION CORRECTION

	Intermediate results are often produced which are misleading
in meaning or are in the wrong form for further processing.  We
therefore incorporate at various points rules which detect certain
undesired intermediate results and convert them to the desired form.
Example:

	<CORRECT_FORM> = (QUESTION ... (SENTENCE ...)) →
		(QUESTION ... ...)

UNKNOWN WORDS

	Rules can be devised to handle words which were previously
unknown to the system.  For example:

	<UNKNOWN_WORD> = DR. :X → (NAME :X)@NEW_WORD
				  (DOCTOR :X)@NEW_WORD
		       = THE :X <VERB_PHRASE>:V → (NOUN :X)@NEW_WORD
		       = I :X YOU → (VERB :X)@NEW_WORD

Here "NEW_WORD" is a function which adds new words to the dictionary.
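
A sketch of the idea (illustrative Python; NEW_WORD here is just a
hypothetical dictionary update):

	lexicon = {}

	# The NEW_WORD hook: record a category for a word.
	def new_word(category, word):
	    lexicon.setdefault(word, set()).add(category)

	# Guess categories for an unknown word from its context,
	# mirroring the three rules above.
	def classify_unknown(tokens):
	    if tokens[:1] == ["DR."]:
	        new_word("NAME", tokens[1])
	        new_word("DOCTOR", tokens[1])
	    elif tokens[:1] == ["I"] and tokens[2:3] == ["YOU"]:
	        new_word("VERB", tokens[1])

	classify_unknown(["I", "GLORP", "YOU"])
	print(lexicon)  # {'GLORP': {'VERB'}}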

		CONCLUSION

	We are faced with the problem of natural language as it is
used to interview people in a doctor-patient context.  We have
developed a language-processing system which we believe is capable
of performing in these interviews at a significantly improved level
of performance compared to systems used in the past.  We have
developed techniques which can measure performance in comparison
with the ideal of a real human patient in the same context [3,4].
We are designing our system with the realization that a long period
of development is necessary to reach desired levels of performance.
This is a system that can work at a measured level of performance
and be improved over time with new rules having minimal interaction
with those already existing.  Our system is designed so that a
complete analysis of every word or phrase of an utterance is not
necessary.

	The basis of this system is a rewrite interpreter which will
automatically merge new rules into the set of already existing rules
so that the system will continue to handle sentences which it
handled in the past.


		REFERENCES

[1] Colby, K.M., Watt, J. and Gilbert, J.P. A computer method of
    psychotherapy. Journal of Nervous and Mental Disease,
    142, 148-152, 1966.
[2] Colby, K.M. and Enea, H. Heuristic methods for computer
    understanding of natural language in context-restricted on-line
    dialogues. Mathematical Biosciences, 1, 1-25, 1967.
[3] Colby, K.M., Hilf, F.D., Weber, S. and Kraemer, H. Turing-like
    indistinguishability tests for the validation of a computer
    simulation of paranoid processes. Artificial Intelligence,
    3, 199-221, 1972.
[4] Colby, K.M. and Hilf, F.D. How to use and how not to use
    Turing-like tests in evaluating the adequacy of simulation
    models. (See this volume.)
[5] Enea, H. MLISP. Technical Report No. CS-92, 1968, Computer
    Science Department, Stanford University.
[6] Schank, R.C., Tesler, L. and Weber, S. Spinoza II: Conceptual
    case-based natural language analysis. Memo AIM-109, 1970,
    Stanford Artificial Intelligence Project, Stanford University.
[7] Simmons, R.F. Some semantic structures for representing English
    meanings. Preprint, 1970, Computer Science Department,
    University of Texas, Austin.
[8] Smith, D.C. MLISP. Memo AIM-135, 1970, Stanford Artificial
    Intelligence Project, Stanford University.
[9] Weizenbaum, J. ELIZA -- a computer program for the study of
    natural language communication between man and machine.
    Communications of the ACM, 9, 36-45, 1966.
[10] Wilks, Y.A. Understanding without proofs. (See this volume.)
[11] Winograd, T. A program for understanding natural language.
    Cognitive Psychology, 3, 1-191, 1972.
[12] Woods, W.A. Transition network grammars for natural language
    analysis. Communications of the ACM, 13, 591-606, 1970.